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Abstract 

We consider the problem of approximating the unknown density u G L 2 (fi, A) of 
a measure /i on f] C M n , absolutely continuous with respect to some given reference 
measure A, from the only knowledge of finitely many moments of \±. Given d G N 
and moments of order d, we provide a polynomial pd which minimizes the mean 
square error f(u — p) 2 d\ over all polynomials p of degree at most d. If there is no 
additional requirement, pd is obtained as solution of a linear system. In addition, 
if pd is expressed in the basis of polynomials that are orthonormal with respect to 
A, its vector of coefficients is just the vector of given moments and no computation 
is needed. Moreover pd — > u in L 2 (fi,A) as d oc. In general nonnegativity of 
Pd is not guaranteed even though u is nonnegative. However, with this additional 
nonnegativity requirement one obtains analogous results but computing pd > that 
minimizes J(u — p) 2 d\ now requires solving an appropriate semideflnite program. 
We have tested the approach on some applications arising from the reconstruction 
of geometrical objects and the approximation of solutions of nonlinear differential 
equations. In all cases our results are significantly better than those obtained with 
the maximum entropy technique for estimating u. 

Keywords: Moment problems; density estimation; inverse problems; semideflnite pro- 
gramming. 
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1 Introduction 



Estimating the density u of an unknown measure \i is a well-known problem in statistical 
analysis, physics or engineering. In a statistical context, one is usually given observations 
in the form of a sample of independent or dependent identically distributed random vari- 
ables obtained from the unknown measure \i. And so there has been extensive research 
on estimating the density based on these observations. For instance, in one of the most 
popular approaches, the kernel density estimation [25J, the density u is estimated via a 
linear combination of kernel functions - each of them being identified with exactly one 
observation. The crucial step in this method is to choose an appropriate bandwidth for 
which minimizing the integrated or the mean-integrated squared error between u and its 
estimate is most common. Another very popular approach uses wavelets [16j [6j [30], an 
example of approximating a density by a truncated orthonormal series. The coefficients in 
the truncated wavelet expansion are moment estimates derived from the given identically 
distributed observations. Again, the approximation accuracy is often measured by the 
mean-integrated squared error and depends on the number of observations and the degree 
of the truncation. This approach provides a global density estimate satisfying both good 
local and periodic approximation properties. For further details the interested reader is 
referred to [HJ [30] and the many references therein. 

In another context - arising in challenging fields such as image recognition, solving nonlin- 
ear differential equations, spectral estimation or speech processing - no direct observation 
is available, but rather finitely many moments of the unknown measure \± are given. Then 
the issue is to reconstruct or approximate the density u based on the only knowledge of 
finitely many moments, (say up to order d G N), an inverse problem from moments. A 
simple method due to [29] approximates the density u by a polynomial p of degree at most 
rf, so that the moments of the measure pdX matches those of /i, up to order d. However, 
and in contrast with more sophisticated approaches, the resulting polynomial approxi- 
mation p is not guaranteed to be a density (even though u is) as it may takes negative 
values on the domain of integration. One classical approach to the moment problem is 
the Pade approximation [3] which is based on approximating the measure by a (finite) 
linear combination of Dirac measures. The Dirac measures and their weights in the de- 
composition are determined by solving a nonlinear system of equations. In the maximum 
entropy estimation (another classical approach) one selects the best approximation of u 
by maximizing some functional entropy, the most popular being the Boltzmann-Shannon 
entropy. In general some type of weak convergence takes place as the degree increases 
as detailed in [5]. Alternatively the norm of the approximate density is chosen as an 
objective function [H EH ESI E], which allows to show a stronger convergence in norm. 
In [2T] . maximum entropy and Pade approximates have been compared on some numer- 
ical experiments. Finally, piecewise polynomial spline based approaches have also been 
proposed in [T4] . 

Motivation. Our main motivation to study the (inverse) moment problem arises in 
the context of the so-called generalized problem of moments (GPM). The abstract GPM 
is a infinite-dimensional linear program on some space of Borel measures on W 1 and 
its applications seem endless, see e.g. [T71 HE] and the many references therein. For 
instance, to cite a few applications, the GPM framework can be used to help solve a weak 
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formulation of some ordinary or partial differential equations, as well as some calculus of 
variations and optimal control problems. The solution u of the original problem (or an 
appropriate translate) is interpreted as a density with respect to the Lebesgue measure 
A on some domain and one computes (or approximates) finitely many moments of the 
measure d\± := udX by solving an appropriate finite-dimensional optimization problem. 
But then to recover an approximate solution of the original problem one has to solve an 
inverse problem from moments. This approach is particularly attractive when the data 
of the original problem consist of polynomials and basic semi- algebraic sets. In this case 
one may define a hierarchy (as the number of moments increases) of so-called semidefinite 
programs to compute approximations of increasing quality. 

Contribution 

In this paper we consider the following inverse problem from moments: Let \i be a finite 
Borel measure absolutely continuous with respect to some reference measure A on a box 
Q of MJ 1 and whose density u is assumed to be in L 2 (fi, A), with no continuity assumption 
as in previous works. The ultimate goal is to compute an approximation Ud of based 
on the only knowledge of finitely many moments (say up to order d) of ji. In addition, 
for consistency, it would be highly desirable to also obtain some "convergence" Ud —> u as 
d —> oo. 

(a) Firstly, we approximate the density u by a polynomial u* d of degree d which minimizes 
the mean squared error f Q (u—p) 2 d\ (or equivalently the L 2 (fi, A)-norm \\u — p\\l) over all 
polynomials p of degree at most d. We show that an unconstrained L 2 -norm minimizer u d 
exists, is unique, and coincides with the simple polynomial approximation due to [29]; it 
can be determined by solving a system of linear equations. It turns out that u* d matches 
all moments up to degree d ) and it is even easier to compute if it is expressed in the 
basis of polynomials that are orthonormal with respect to A. No inversion is needed and 
the coefficients of u d in such a basis are just the given moments. Moreover we show 
that u*i —> u in L 2 (^, A) as d oo, which is the best we can hope for in general since 
there is no continuity assumption on u] in particular u* d — >■ u almost-e very where and 
almost-uniformly on Q for some subsequence (d*.), fcGN. Even though both proofs are 
rather straightforward, to the best of our knowledge it has not been pointed out before 
that not only this mean squared error estimate u d is much easier to compute than the 
corresponding maximum entropy estimate, but it also converges to u as d oo in a much 
stronger sense. For the univariate case, in references [27] and [2] the authors address the 
problem of approximating a continuous density on a compact interval by polynomials or 
kernel density functions that match a fixed number of moments. In this case, convergence 
in supremum norm is obtained when the number of moments increases. An extension to 
the noncompact (Stieltjes) case is carried out in [8]. Notice that in [27] it was already 
observed that the resulting polynomial approximation also minimizes the mean square 
error and its coefficients solve a linear system of equations. In [H [28] the minimum-norm 
solution (and not the minimum distance solution) is shown to be unique solution of a 
system of linear equations. In [15] the minimal distance solution is considered but it is 
obtained as the solution of a constrained optimization problem and requires an initial 
guess for the density estimate. 
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(b) However, as already mentioned and unlike the maximum entropy estimate, the above 
unconstrained L 2 -norm minimizer u d may not be a density as it may take negative values 
on Q. Of course, the nonnegative function u d+ := max[0, u d ] also converges to u in L 2 but 
it is not a polynomial anymore. So we next propose to obtain a nonnegative polynomial 
approximation u d by minimizing the same L 2 -norm criterion but now under the additional 
constraint that the candidate polynomial approximations should be nonnegative on Q. 
In principle such a constraint is difficult to handle which probably explains why it has 
been ignored in previous works. Fortunately, if Q is a compact basic semi-algebraic 
set one is able to enforce this positivity constraint by using Putinar's Positivstellensatz 
[26] which provides a nice positivity certificate for polynomials strictly positive on Q. 
Importantly, the resulting optimization problem is convex and even more, a semidefinite 
programming (SDP) problem which can be solved efficiently by public domain solvers 
based on interior-point algorithms. Moreover, again as in the unconstrained case, we 
prove the convergence u d —> u in L 2 (^,A) as d — >• oc (and so almost-everywhere and 
almost-uniform convergence on Q, as well for some subsequence ), k G N) which is far 
stronger than the weak convergence obtained for the maximum entropy estimate. Notice, 
in [29j[27] methods for obtaining some non-negative estimates are discussed, however these 
estimates do not satisfy the same properties in terms of mean-square error minimization 
and convergence as in the unconstrained case. In the kernel density element method 
[21 [8] a nonnegative density estimate for the univariate case is obtained by solving a 
constrained convex quadratic optimization problem. However, requiring each coefficient 
in the representation to be nonnegative as presented there seems more restrictive than 
the nonnegative polynomial approximation proposed in this paper. 

(c) Our approach is illustrated on some challenging applications. In the first set of prob- 
lems we are concerned with recovering the shape of geometrical objects whereas in the 
second set of problems we approximate solutions of nonlinear differential equations. More- 
over, we demonstrate the potential of this approach for approximating densities with jump 
discontinuities, which is harder to achieve than for the smooth, univariate functions dis- 
cussed in [27] . The resulting L 2 -approximations clearly outperform the maximum entropy 
estimates with respect to both running time and pointwise approximation accuracy. More- 
over, our approach is able to handle sets Q more complicated than a box (as long as the 
moments of the measure udX are available) as support for the unknown density, whereas 
such sets are a challenge for computing maximum entropy estimates because integrals of 
a combination of polynomials and exponentials of polynomials must be computed repeat- 
edly. 

Outline of the paper 

In Section [2] we introduce the notation and we state the problem to be solved. In Section 
[3] we present our approach to approximate an unknown density u by a polynomial u d of 
degree at most d via unconstrained and constrained L 2 -norm minimization, respectively; 
in both cases we also prove the convergence u* d u in L 2 (fi, A) (and almost-uniform con- 
vergence on Q as well for some subsequence) as d increases. In Section [I] we illustrate the 
approach on a number of examples - most notably from recovering geometric objects and 
approximating solutions of nonlinear differential equations - and highlight its advantages 
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when compared with the maximum entropy estimation. Finally, we discuss methods to 
improve the stability of our approach by considering orthogonal bases for the functions 
spaces we use to approximate the density. And we discuss the limits of approximating 
discontinuous functions by smooth functions in connection with the well-known Gibbs 
effect. 

2 Notation, definitions and preliminaries 
2.1 Notation and definitions 

Let R[x] (resp. M[x]^) denote the ring of real polynomials in the variables x = (#i, . . . , x n ) 
(resp. polynomials of degree at most d) ) whereas E[x] (resp. S[x]^) denotes its subset of 
sums of squares (SOS) polynomials (resp. SOS of degree at most 2d). 

With (lcK n and a given reference measure A on let L 2 (f], A) be the space of functions 
on Q whose square is A-integrable and let Zq_(f2, A) C L 2 (f],A) be the convex cone of 
nonnegative elements. Let C(Q) (resp. C+(fi)) be the space of continuous functions 
(resp. continuous nonnegative functions) on Q. Let P(fl) be the space of polynomials 
nonnegative on Q. 

For every a G N n the notation x. a stands for the monomial x^ 1 • • • x^ n and for every d G N, 
let := {a G N n : £\ a j < 4 whose cardinal is s(d) = A polynomial / G R[x] 

is written 

x->/(x) = f^ a 

a£N n 

and / can be identified with its vector of coefficients f = (f a ) in the canonical basis (x a ), 
a G N n . Denote by S n the space of real n x n symmetric matrices, and by §™ the cone 
of positive semidefinite elements of S> n . For any A G the notation A >z stands for 
positive semidefinite. A real sequence y = (y a ), a G N n , has a representing measure if 
there exists some finite Borel measure \i on W 1 such that 

y a = J x a rf//(x), Va G N n . 

Linear functional 

Given a real sequence y = (y a ) define the Riesz linear functional L y : IR[x] —> R by: 

/(=£ ^ ^ L y^ = £/«!/«, / e R[X]. 

a a 

Moment matrix 

Given d G N, the moment matrix of order d associated with a sequence y = (y a ), a G N n , 
is the real symmetric matrix M^(y) with rows and columns indexed by NJ, and whose 
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entry (a, ft) is y a +/3i for every a, ft G NJ. If y has a representing measure /i then M^(y) >z 
because 

(f,M d (y)f) = JfdfjL >0, Vf GR«» 
Localizing matrix 

With y as above and g G K[x] (with g(x) = ^ 7 g 7 x 7 ), the localizing matrix of order d 
associated with y and g is the real symmetric matrix M^(^y) with rows and columns 
indexed by N^, and whose entry (a, ft) is ^ 7 fl^a+z^py), f° r every a, ft G NJ. If y has 
a representing measure /i whose support is contained in the set {x : g{x) > 0} then 
M^(^y) y because 

<f,M d (</y)f> = [ fgdfi>0, Vf GR*» 



2.2 Problem statement 

We consider the following setting. For Q C W 1 compact, let \i and A be a-finite Borel 
measures supported on Q. Assume that the moments of A are known and /i is absolutely 
continuous with respect to A (/x <C A) with Radon-Nikodym derivative (or density) u : 
Q —> R+, with respect to A. The density ?x is unknown but we know finitely many moments 
y = (Va) of that is, 

y a := / x a 2i(x)rfA(x) = / x a rf/x(x), Va G NJ, (1) 

for some d G N. 

The issue is to find an estimate u d : f2 — >• K+ for u, such that 

/ x*u d (x)dA(x) = y a , Va G NJ. (2) 



2.3 Maximum entropy estimation 

We briefly describe the maximum entropy method due to [T21 [131 E] as a reference for 
later comparison with the mean squared error approach. 

If one chooses the Boltzmann-Shannon entropy H(u) := — ulogu, the resulting estimate 
with maximum-entropy is an optimal solution of the optimization problem 



max / H{u d )d\ s.t. / x. a u d (x.)d\(x.) = y a , Va G N d . 
u d Jn Jn 

It turns out that an optimal solution is of the form 



x i — y ^(x) := exp u a x° 

Aa\<d 
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for some vector = (u a ) £ M s( ^. Hence, computing an optimal solution u* d reduces to 
solving the finite-dimensional convex optimization problem 

max I (y, u d ) - exp V « Q x a dA(x) > (3) 
Ud€Rs(d) \ J" Vlfe / J 

where y = (y a ) is the given moment information on the unknown density u. If (t^), 
d G N, is a sequence of optimal solutions to ([3]), then following weak convergence occurs: 

lim / ^(x)^(x) rfA(x) = / -0(x)w(x)dA(x), (4) 

for all bounded measurable functions tj) : Q — » R continuous almost everywhere. For more 
details the interested reader is referred to [5] . 

Since the estimate u* d is an exponential of a polynomial, it is guaranteed to be nonnegative 
on Q and so it is a density However, even though the problem is convex it remains hard 
to solve because in first or second-order optimization algorithms, computing the gradient 
or Hessian at a current iterate = (u a ) requires evaluating integrals of the form 



/ x a exp 2_ \ w a x ° rfA(x), 
Ja \H< d J 



aeN n d 

which is a difficult task in general, except perhaps in small dimension n = 1 or 2. 

3 The mean squared error approach 

In this section we assume that the unknown density u is an element of L 2 (f], A), and we 
introduce our mean squared error, or L 2 -norm approach, for density approximation. 

3.1 Density approximation as an unconstrained problem 

We now restrict Ud to be a polynomial, i.e. 



x h-> u d (x) := ^ u a 

\a\<d 



x a 



for some vector of coefficients u = (u a ) G We first show how to obtain a polynomial 

estimate u* d G M[x] d of u satisfying ^ by solving an unconstrained optimization problem. 

Let z denote the sequence of moments of A on i.e., z = (z a ), a G N n , with 



and let M^(z) denote the moment matrix of order d of A. This matrix is easily computed 
since the moments of A are known. 
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Consider the unconstrained optimization problem 



min \\u — Ud\\o (= / (u — Ud) 2 d\). (5) 
u*mx]d V Jn J 

Proposition 1 Let Q C W 1 have a nonempty interior and let X(0) > for some open 
set O C Q. A polynomial u d G K[x]d an optimal solution of problem |5p z/ and on/y z/ 
z£s vector of coefficients u d G is an optimal solution of the unconstrained quadratic 

optimization problem 

min {ujM d (z)u d -2uJy}. (6) 
Then u d := M^ 1 (z)y is the unique solution of |^) ; and u d G K[x]^ satisfies: 

[ x a u* d dX = y a = [ * a ud\ VaG N^. (7) 



Proof: Observe that for every u d G M[x]^ with vector of coefficients G 

/ (u — u d ) 2 d\ = / u d dX — 2 I u d udX + u 2 dX 
Jn Jn Jn^ — v — ' 7^ 



D5(d) 



ujM d (z)u (i -2uJy+ / 2i 2 rfA. 



The third term on the right handside being constant, it does not affect the optimization 
and can be ignored. Thus, the first claim follows. 

The second claims follows from the well-known optimality conditions for unconstrained, 
convex quadratic programs and the fact that M^(z) is nonsingular because M^(z) >- 
for all d G N. Indeed, if q T M^(z)q = for some / q G then necessarily the 

polynomial q G R[x]d with coefficient vector q vanishes on the open set O, which implies 
that g = 0, in contradiction with 0. 

Finally, let e a G be the vector of coefficients associated with the monomial x a , 

a G N d . from Md(z)uJ = y we deduce 



y a = e^M,(z)u* = / x a u* d d\ 

Jn 



which is the desired result. □ 

Thus the polynomial u d G R[x]d minimizing the L 2 -norm distance to u coincides with 
the polynomial approximation due to [29] defined to be a polynomial which satisfies all 
conditions Note that this is not the case anymore if one uses an ZAnorm distance 
with p > 2. 

Next, we obtain the following convergence result for the sequence of minimizers of problem 
(§, deN. 

Proposition 2 Let Q be compact with nonempty interior and let X be finite with A(O) > 
for some open set O Cfl. Let (u d ), d G N, be the sequence of minimizers of problem |5p. 
Then \\u — u d \\ 2 as d — >■ oo. In particular there is a subsequence (d k ), G N ; such 
that u dk —> u, X-almost everywhere and X-almost uniformly on as k —> oo. 



Proof: Since Q is compact, R[x] is dense in L 2 (fi, A). Hence as i/, G L 2 (Q, A) there exists a 
sequence (vk) C K[x], fcGN, with ||^ — ^^Ib — ^ as A; — ^ oc. Observe that if dk = deg Vk 
then as u* dk solves problem ([5]), it holds that \\u — v^\\2 > ll^ - u d k W 2 ^ or a ^ ^' which 
combined with \\u — Vk\\2 yields the desired result. The last statement follows from 
PQ Theorem 2.5.2 and 2.5.3]. □ 

Note that computing the L 2 norm minimizer u d is equivalent to solving a system of 
linear equations, whereas computing the maximum entropy estimate requires solving the 
potentially hard convex optimization problem Moreover, the L 2 -convergence \\u d — 
tx||2 — ^ in Proposition [2] (and so almost-everywhere and almost-uniform convergence 
for a subsequence) is much stronger than the weak convergence Q. On the other hand, 
unlike the maximum entropy estimate, the L 2 -approximation u d G K[x]d is not guaranteed 
to be nonnegative on f2, hence it is not necessarily a density. Methods to overcome this 



shortcoming are discussed in £3.2 



Remark 1 In general the support K := supp ji of /i may be a strict subset of Cl = supp A. 
In the case where K is not known or its geometry is complicated, one chooses a set Q D K 
with a simple geometry so that moments of A are computed easily. As demonstrated on 
numerical examples in Section [4j choosing an enclosing frame Q as tight as possible is 
crucial for reducing the pointwise approximation error of u d when the degree d is fixed. 
In the maximum entropy method of £2.3 no enclosing frame Q D K is chosen. In the 
L 2 -approach, choosing Q D K is a degree of freedom that sometimes can be exploited for 
a fine tuning of the approximation accuracy. 



Remark 2 From the beginning we have considered a setting where an exact truncated 
moment vector y of the unknown density u > is given. However, usually one only 
has an approximate moment vector y and in fact, it may even happen that y is not 
the moment vector of a (nonnegative) measure. In the latter case, the maximum of 
the convex problem ^ is unbounded, whereas the L 2 -norm approach always yields a 
polynomial estimate u d G M[x]^. 



Remark 3 If y is a slightly perturbed version of y, the resulting numerical error in Ud 
and side effects caused by ill conditioning may be reduced by considering the regularized 
problem 

min / (u — Ud) 2 d\ + e\\ud\\ 2 (8) 

where ||u^|| 2 is the Euclidean norm of the coefficient vector G of Ud G M[x]d, and 
e > (fixed) is a regularization parameter approximately of the same order as the noise 
in y. The coefficient vector of an optimal solution u d (e) of ^ is then given by 

u* d (e) = (M d + eiy 1 y. (9) 

The effect of small perturbations in y on the pointwise approximation accuracy of u* d for u 
is demonstrated on some numerical examples in Section [4} However, a more detailed anal- 
ysis of the sensitivity of our approach for noise or errors in the given moment information 
is beyond the scope of this paper. 
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3.2 Density approximation as a constrained optimization prob- 
lem 

As we have just mentioned, the minimizer u* d of problem ^ is not guaranteed to yield 
a nonnegative approximation even if u > on Q. As we next see, the function x \-> 
u* d+ (x) := max[0, t^(x)] also converges to u for the L 2 -norm but 

• it does not satisfy the moments constraints (i.e., does not match the moments of 
dji = udX up to order d); 

• it is not a polynomial anymore (but a piecewise polynomial). 

In the sequel, we address the second point by approximating the density with a poly- 
nomial nonnegative on fi, which for practical purposes, is easier to manipulate than a 
piecewise polynomial. We do not address explicitly the first point, which is not as cru- 
cial in our opinion. Note however that at the price of increasing its degree and adding 
linear constraints, the resulting polynomial approximation may match an (a priori) fixed 
number of moments. 

Adding a polynomial nonnegativity constraint to problem ^ yields the constrained op- 
timization problem: 

min {H'U — Ud\\l '• Ud>0 on fi} (10) 

u d eR[x] d 

which, despite convexity, is untractable in general. We consider two alternative opti- 
mization problems to enforce nonnegativity of the approximation; the first one considers 
necessary conditions of positivity whereas the second one considers sufficient conditions 
for positivity. 



Necessary conditions of positivity 

Consider the optimization problem: 

min {\\u-u d \\ 2 2 : M d (u d z) ^0}, (11) 

where z = (z a ) is the moment sequence of A and M^(^z) is the localizing matrix associ- 



ated with Ud and z. Observe that problem (11) is a convex optimization problem because 



the objective function is convex quadratic and the feasible set is defined by linear matrix 
inequalities (LMIs). 

The rationale behind the semidefiniteness constraint M^(^z) ^ in (11) follows from a 
result in [19] which states that if suppA = Q and M^(^z) > for all d then u d > on 

n. 

Lemma 1 If Q is compact then P(£V) is dense in L 2 (Q, A) + with respect to the L 2 -norm. 

Proof: As Vt is compact the polynomials are dense in L 2 (f],A). Hence there exists a 
sequence (u d ) C !R[x], d E N, such that \\u — Ud\\2 ~^ as d —> oc. But then the sequence 
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d G N, with ^J(x) := max[0, t^(x)] for all x G f2, also converges for the L 2 -norm. 
Indeed, 



(u — Ud) 2 d\ — I (u — UdfdX + / 

JQ «/Qn{x:it,/(x)<0} JQf 



(u — Ud) dX 

^n{x:u d (x)<0} J Qn{x:u d {x)>0} 



> / u 2 dX+ / (u-u d ) 2 dX 

Jnn{x:u d (x)<o} J nn{x:u d (*)>o} 

(u — u~^) 2 dX \\u — Uj\\l. 

Jn 

So let k G N and dj, G N be such that \\u — Ud k \\2 < k~ l and so \\u — t^Jb < 
As is continuous and Q is compact, by the Stone- Weierstrass theorem there exists 
a sequence (vd k i) C M[x], f GN, that converges to for the supremum norm. Hence 
su Pxeft \ u d k ~ v d k i\ < for all £ > £k (for some Therefore, the polynomial := 
^<44 + * s positive on Q and 

Ik-^Jb < Ih-^ilb+ll^i -wd k £ k h 

< k ~ X + hd k ~ V d k i k h + IK4 - w d k £ k H2 

V v ' ' v ' 

<fc-U(n) 1 /2 =fc-iA(n)i/2 

< fc- 1 + 2fc- 1 A(fi) 1/2 . 

Therefore we have found a sequence (wd k £ k ) C fc G N, such that — ^d fc € fc H2 ~~ ^ 

as A; 00. □ 



Proposition 3 Let Q be compact with nonempty interior. Then problem (11) has an 
optimal solution u* d for every d G N, and \\u — ?x^|| 2 — >> as d ^ 00. 

Proof: Fix d and consider a minimizing sequence (ue) C R[x]d with \\u — mono- 
tonically decreasing and converging to a given value as £ —> 00. We have ||t^|| 2 < 
1Mb + ||^ ~~ ^Ib < IMI2 + ||^ _ U0W2 for all f GN. Therefore as || • || 2 defines a norm on 
the finite dimensional space R[x]d (fi has nonempty interior) the whole sequence (ui) is 
contained in the ball {v : \\v\\2 < IMI2+ ||^ _ ^0 1 1 2} - As the feasible set is closed, problem 



(11) has an optimal solution. 



Let (u*l) C M[x] be a sequence of optimal solutions of problem (11). By Lemma[TJ P(Q) is 
dense in L 2 (Q) + . Thus there exists a sequence (vk) C M[x]> , fcGN, with ||m — ^|| 2 — » 
as k —> 00. As Vk > on then necessarily M^(^ z) >: for all and all fc. In particular 
Md(vjfe d z) >: where fc^ := max{/c : deg^ < rf}. Therefore ^ d G K[x]^ is a feasible 

12 \ |L, „,*||2 



solution of problem (11) which yields \\u — ^/cj| 2 > 11^ — ^11 2 fo r a ^ ^- Combining with 



^ — v k d \\2 — >• yields the desired result. □ 



Via sufficient conditions of positivity 

Let bd(x) := (1, £1, . . . , x n , xf, Xix 2 , . . . , x^) T denote the standard monomial basis of 
R[x]d- Let Q be a basic compact semi-algebraic set defined by Q = {x G M n | #j(x) > 
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0, j — 1, . . . , m} for some polynomials gj G M[x], j = 1, . . . ,m with <ij = [~(deg <7j)/2] . 
Then, with d > maxj dj, consider the optimization problem: 



mm \\u 

u d m[x\ d 



u d\\l 



s.t. u d (x) = b d (x) T Aob d (x) + J2T =1 bd-dj (x) T A i b d _ (ij (x)^ (x), 



(12) 



A G§f , Aj e §' 



s(d—dj) 



3 = 1, 



, m. 



Since the equality constraints are linear in and the entries of Aj, j = 0, . . . ,m, the 



feasible set of (12) is a convex LMI set. Moreover, the objective function is convex 
quadratic. 



Note that whereas the semidefinite constraint M^(^z) ^ in (11) was a relaxation of 
the nonnegativity constraint ^ > on 0, in ( fl2| ) a feasible solution Ud is necessarily 
nonnegative on Q because the LMI constraint on Ud is a Putinar certificate of positivity 



on Q. However, in problem (12) we need to introduce m + 1 auxiliary matrix variables 



Aj, whereas (11) is an optimization problem in the original coefficient vector and does 



not require such a lifting. Thus, problem (11) is computationally easier to handle than 



problem (12), although both are convex SDPs, which are substantially harder to solve 
than problem ([6]). 



Proposition 4 Let Q be compact with nonempty interior. For every d > maxj dj, prob- 
lem (12) has an optimal solution u d and \\u — u* d \\\ — >> as d — >> oo. 



Proof: With d > maxj dj fixed, consider a minimizing sequence (u d ) C K[x] d for problem 
(12). As in the proof of Proposition [3] one has ||t^||i < ||^||2 < 1Mb + ||^ — u o\\2- As u* d 



is feasible, 



* 



(x) = b4x) T Ajb d (x) + Y,^d- d M) TA -%-dM)9i^), 

3 = 1 



for some real symmetric matrices A 1 - y 0, j = 0, . . . , m. Rewriting this as 



u. 



|(x) = (A^,b,(x)b (i (x) T ) + 5](A J d ,b^(x)b (i _^(x) T ), 

3 = 1 



\u* d \\i < a, 



and integrating with respect to A yields 

(A^M d (z)) + ^(Aj,M d _ d .( ft z)) = u* d d\< 

3 = 1 Jsi 

with a := ||^||i < 1 1 ^ 1 1 2 + ||^ ~~ u o\\2- Hence, for every rf, 

(A , M d (z)) < a and (Aj, M d _ dj (g 3 z)) < a, j = 1, . . . , m. 

As Mrf(z) >- 0, Nld-djigj z) >- 0, and Aj >: 0, j = l,...,m, we conclude that all 
matrices Aj are bounded. Therefore the minimizing sequence (u^ (Aj)) belongs to a 
closed bounded set and as the mapping v H> \\u — v\\\ is continuous, an optimal solution 
exists. 
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From Lemma [I] there exists (vk) C M[x]> , k e N, such that ||m — Vk\\\ — >* as — »> oc. 
Using properties of norms, — ^||2 — < ||?/ — + ^ _1 )||2 < ||^ — ^/clb + and 
so \\u — (vk + ^ _1 )||2 — y as /c — y oc. Moreover, as Vk + k~ l is strictly positive on by 
Putinar's Positivstellensatz [26], there exists d& such that 



v fe (x) = b dfc (x) T A b dfe (x) + J^b (ifc _ (ij (x) T A J b dfc _ (ij (x)^(x), Vx, 

for some real matrices Aj y 0, j — 0, . . . , to. So letting = max[deg^, d^], the 
polynomial ^ + is a feasible solution of problem (12) whenever d > and with value 
||^ — (vk + ^ _1 )||2 > Hence ||?/ — v d +\\l — >■ as — >• oc, and by monotonicity of the 
sequence (||w — ^Hl), d G N, the result follows. □ 

Remark 4 Since is compact, Proposition 3] and [4] imply that minimizers of the two 
constrained L 2 norm minimization problems ( Tlj ) and (12) converge almost uniformly to 
the unknown density u as in the unconstrained case. 

Remark 5 Our approach can handle quite general sets Q and K as support and frame 
for the unknown measure in the unconstrained and constrained cases, the only restriction 
being that (i) Q and K are basic compact semi- algebraic sets, and (ii) one can compute 
all moments of A on Q. In contrast, to solve problem ^ by local minimization algorithms 
using gradient and possibly Hessian information, integrals of the type 

x a exp yjupx? dA(x) 

n V p J 

must be evaluated. Such evaluations may be difficult as soon as n > 3. In particular 
in higher dimensions, cubature formulas for approximating such integrals are difficult to 
obtain if Q is not a box or a simplex. 



4 Numerical experiments 



In this section, we demonstrate the potential of our method on a range of examples. We 
measure the approximation error between a density u and its estimate Ud by the average 
error 

£d '= / \u(x) — Ud{x)\d\{x) 
and the maximum pointwise error 

id := max \u(x) — Ud{x)\. 

In some examples we also consider e° d and e° d) the respective errors on particular segments 
of the interior of Q. We compare the performance of our approach to the maximum 



entropy estimation from £2.3 Both methods are encoded in Matlab. We implemented 
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the mean squared error (L 2 ) minimization approach for the standard monomial basis, 
which results in solving a linear system in the unconstrained case. In the constrained 
case we apply SeDuMi to solve the resulting SDP problem. In both cases, the numerical 
stability can be improved by using an orthogonal basis (such as Legendre or Chebychev 
polynomials) for M[x\. In order to solve the unconstrained, concave optimization problem 
([3]), we apply the Matlab Optimization Toolbox command fminunc as a black-box solver. 
The observed performance of the maximum entropy estimation (MEE) method may be 
improved when applying more specialized software. 




00 00 



Figure 1: Degree 10 approximation of the density from exact moment vector (left) vs. 
perturbed moment vector (right). 



Example 1 First we consider the problem of retrieving the density u(x) = x\ + x 2 on 
[0, l] 2 given its moments y a = / fi x a w(x)dx = {ai+1 ) (a2+2) + ( ai+2 )(a 2 +i) > whlch was con ~ 
sidered as a test case in JEflJ . This example is a priori very favorable for our technique 
since the desired u is a polynomial itself When solving problem |5p for d G {3, 5, 10} 
we do obtain the correct solution u d = (0, 1, 1, 0, . . .) T in less than 0.1 sees. Thus, the 
pointwise approximation is much better than the one achieved in [241- Moreover, u* d > 
without adding LMI constraints. 

The question arises of how the polynomial approximation behaves if the moment vector 
contains some noise, i.e. if it does not exactly coincide with the moment vector of the 
desired u. We solve problem |5p for y := y + e where the maximal relative componentwise 
perturbation e between y and y is less than 3%. The pointwise error between ^i (y) and 
the solution for the perturbed moment vector u* Q (y) ^ s sufficiently small, as pictured in 
Figure [7} 

Example 2 Next we consider recovering the function u : [—1, 1] — )> R with u{x) = \x\ as 
a first example of a nondifferentiable function. In a first step we solve problem |5p for the 
exact moment vector with entries yk := J_ 1 \x\x dx = fe v +2 ; , k = 0, . . . , d corresponding 
to the density u. The resulting estimates u* d for d G {20, 30, 50} provide a highly accurate 
pointwise approximation of u on the entire domain, as reported in Table [7] and pictured 
in Figure^ (left). 
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d 


problem 






20 


( 


5 


) 


0.0031 


0.0296 


30 


( 


5 


) 


0.0022 


0.0265 


50 


( 


5 


) 


0.0021 


0.0251 



Table 1: Estimating the density u(x) = \x\ from an exact moment vector. 



d 


problem 


Q 




7° 


F° 
e d 


10 


( 


5 


) 


0.0252 


0.2810 


0.0207 


0.1358 


20 


( 


5 


) 


0.0244 


0.7934 


0.0142 


0.0952 


30 
30 


( 


5 

12 


) 

) 


0.0237 
0.0176 


1.0956 
0.4705 


0.0112 
0.0106 


0.1024 
0.0937 


50 
50 


( 


I 

12 


) 

) 


0.0236 
0.0206 


1.4591 
0.6632 


0.0088 
0.0118 


0.1023 
0.0979 



Table 2: Estimating the density u(x) = \x\ from a perturbed moment vector. 



In a second step we consider a perturbed moment vector y as input. When solving problem 
we observe that both errors q on the interior of [—1,1] - in particular at the 
nondifferentiable point x — - decrease for increasing d, whereas the pointwise error 
increases at the boundary of the domain. Although providing good approximations for u 
on the entire interior of [—1,1], the estimates ul and ul Q take negative values at the 
boundary {—1, 1}. This is circumvented by solving problem (12). The new estimates are 
globally nonnegative while their approximation accuracy is only slightly worse than in the 
unconstrained case, as reported in Table^ and pictured in Figure^ (right). 



Example 3 Consider the functions u\, u 2 : [—1,1] —> Ui(x) = ^(x) = ||— x 2 |^ ; 
which are both not locally Lipschitz. Applying mean squared error minimization for d G 
{20, 50} to the exact moment vectors yields accurate polynomial approximations for both 
functions on their entire domains, even at the boundary and at the points where the 
functions are not locally Lipschitz, cf. Figure [3| 



4.1 Recovering geometric objects 

One of the main applications of density estimation from moments is the shape recon- 
struction of geometric objects in image analysis. There has been extensive research on 
this topic, cf. [29, 20j and the references therein. The reconstruction of geometric ob- 
jects is a particular case of density estimation when u = Ik-> i-e. the desired density 
u is the indicator function of the geometric object K C Vt C K n . Its given moments 
y a = f Q x a 7x(x)(iA(x) = J K x a dA(x) do not depend on the frame However, Q does 
enter when computing the matrix M^(z) in problem ([6]). As indicated in Remark [l] and 
demonstrated below, the choice of the enclosing frame Q for K is crucial for the pointwise 
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Figure 3: Mean squared error minimizers for u\ (left) and u 2 (right) from exact moments. 
Blue: u, green: t^o, cyan: u^q. 



approximation accuracy for a fixed order d. Since u has a special structure, we derive an 
estimate for K by choosing a superlevel set of u* d as proposed in [29j : 

K d := {x G R n |«2(x) > \}. 



Example 4 A first object we recover is shaped like the letter E as in [29]. We determine 
the mean squared error minimizer for the same moment vector y with d G {3,5,8, 10} ; 
but for three different frames fl. As pictured in Figure^ we are able to reconstruct the 
estimates derived in f^Pj/ when Q is chosen tight. Moreover, we observe that this good 
approximation of the severely nonconvex set K and its discontinuous indicator function for 
a small number of moments d depends heavily on the choice for The wider the frame, 
the worse gets the approximation accuracy of the truncated estimate. Applying maximum 
entropy estimation for d G {3, 5, 8} yields density estimates of comparable accuracy than 
mean squared error minimization, cf. Figure [5[ However, the computational effort is 
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Figure 4: Recovering the letter E with mean squared error minimization; tight frames 
(top) vs. loose frames (bottom); original (left) vs. estimates for degrees d G {3,5,8, 10} 
(from left to right). 




Figure 5: Recovering the letter E with maximum entropy estimation; original (left) vs. 
estimates for degrees d G {3, 5, 8} (from left to right). 



much larger: for d G {3, 5, 8} ; the computational times for maximum entropy estimation 
are 82, 902 and 4 108 seconds, respectively, whereas the mean squared error minimizer can 
be determined in less than one second for these values of d. 



Example 5 Secondly we consider recovering an F-shaped set, a less symmetric example 
than the E-shaped set of Example 0. As previously, we observe that the approximation 
accuracy of relies heavily on the frame Vt. Even though the set K has a complicated 
geometry, Kiq approximates K accurately if Q is chosen sufficiently tight, cf Figure [6| 

Example 6 Consider approximating K := {x G M 2 \ x\{x\ — ?>x\) + [x\ + x\) 2 > 0}, 
a nonconvex region enclosed by a trefoil curve. Since this curve is of genus zero, its 
moment vector y can be determined exactly. Again, we need to choose an appropriate 
frame VtZ)K. The results for = £>(0, 1) and d G {3, 5, 8, 10} are pictured in Figure^ 
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Figure 6: Recovering the letter F with mean squared error minimization; tight frames 
(top) vs. loose frames (bottom); original (left) vs. estimates for degrees d G {3,5,8, 10} 
(from left to right). 




Figure 7: Trefoil region K (left) and ((K d \K)U(K\ K d )) for d e {3, 5, 8, 10} (from left 
to right). 



4.2 Approximating solutions of differential equations 

Another important class of moment problems arises from the numerical analysis of ordi- 
nary and partial differential equations. Solutions of certain nonlinear differential equa- 
tions can be understood as densities of measures associated with the particular differential 
equation. We obtain an approximate solution from the moment vector of this measure by 
solving an SDP problem. Approaches for deriving moment vectors associated with solu- 
tions of nonlinear differential equations have been introduced in [221 ITU] and are omitted 
here. We assume that the moment vector of a measure whose density is a solution of the 
respective differential equation is given in the following examples. 



Example 7 Given the moment vectors of a solution (u, v) of the following reaction- 
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Figure 8: Estimates for u (left) and v (right) of the reaction-diffusion equation via max- 
imum entropy estimation (above) and mean squared error minization (below). Blue: u ) 
green: u\ {)) red: i^q, cyan: magenta: u^. 



diffusion equation [23]: 

^u" + \ (35 + 16^ - u-uv = 0, 

4v" - (l + §v) v + uv = 0, 

u'(Q) = u\h) = v\0) = v'(5) = 0, 

< u,v < 14 on [0,5], 

we apply both maximum entropy estimation and mean squared error minimization to ap- 
proximate the desired solution. For the numerical results, see Table [3] and Figure [S[ We 
observe that the mean squared error minimizers provide accurate pointwise approximations 
for (u,v) on the entire domain, whereas the maximum entropy estimates provides a fairly 
accurate pointwise approximation on some segment of the domain only. Moreover, the 
mean squared error minimizers are obtained extremely fast as solutions of linear systems 
compared with the maximum entropy estimates. Thus, in this example, mean squared er- 
ror minimization is clearly superior to maximum entropy when estimating densities from 
moments. 



Example 8 Given the moment vector of the nontrivial, positive solution of the Allen- 
Cahn bifurcation PDF: 

u xx + u yy + 22^(1 — u 2 ) = on [0, l] 2 , 

u = on<9[0,l] 2 , (13) 

< u < 1 on [0, l] 2 , 
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d 

Lb 




11±C 111UU 






f°. 


f°. 


timp ( cipp i 


10 


U 


MEE 


2.0536 


4.1063 


2.1679 


4.1063 


38 


30 


u 


MEE 


1.7066 


3.4775 


1.7974 


3.4775 


2 


50 


u 


MEE 


1.6978 


3.3858 


1.7792 


3.3858 


820 


10 


u 


L 2 


0.4611 


1.4506 


0.4619 


1.4506 


0.1 


30 


u 


L 2 


0.3942 


1.2431 


0.4101 


1.2431 


0.1 


50 


u 


L 2 


0.3941 


1.2173 


0.4004 


1.2173 


0.1 


10 


V 


MEE 


0.5782 


2.0982 


0.5205 


1.0806 


221 


30 


V 


MEE 


0.5978 


9.6060 


0.4592 


1.2356 


645 


50 


V 


MEE 


0.6853 


23.5993 


0.4117 


1.2973 


3306 


10 


V 


L 2 


0.3429 


5.4767 


0.1765 


0.5617 


0.1 


30 


V 


L 2 


0.3286 


12.4501 


0.1024 


0.6253 


0.1 


50 


V 


L 2 


0.3744 


14.2223 


0.1454 


0.5907 


0.1 



Table 3: Approximation accuracy of maximum entropy estimation (MEE) and mean 
squared error (L 2 ) optima for the reaction-diffusion equation. 



d 


method 




7° 


time (sec.) 


4 


MEE 


1.3e-2 


l.le-2 


566 


6 


MEE 


9.7e-3 


8.9e-3 


2489 


4 


Li 1 


1.8e-3 


1.4e-3 


0.1 


6 


L 2 


4.6e-4 


2.8e-4 


0.1 


10 


L 2 


5.5e-4 


2.1e-4 


0.1 


12 


L 2 


5.7e-4 


2.1e-4 


0.1 



Table 4: Approximation accuracy of maximum entropy estimation (MEE) and mean 
squared error (L 2 ) optima for the Allen-Cahn bifurcation PDE. 
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Figure 9: Estimates for solution of the Allen-Cahn bifuraction PDE by maximum entropy 
estimation (above) and mean squared error minization (below). Exact solution (left) vs 
estimates for d G {4, 6} and d G {4, 6, 10, 12}, respectively. 



we apply both maximum entropy and mean squared error minization. The numerical re- 
sults are reported in Table^ The approximation accuracy for both methods is comparable, 
with mean squared error minimization being slightly more precise. However, applying max- 
imum entropy estimation is limited for this problem as the cases d > 6 are numerically too 
heavy to be solved in reasonable time. Mean squared error minimization yields increasingly 
better estimates for the desired solution within seconds, as pictured in Figure [5[ 



Example 9 Given the moment vector of the viscosity solution of the classical eikonal 



d 


method 


£d 




time (sec.) 


3 


MEE 


5.4e-2 


5.1e-2 


20 


6 


MEE 


1.9e-2 


1.9e-2 


2192 


3 


L 2 


2.8e-2 


2.6e-2 


0.1 


6 


L 2 


2.8e-2 


2.6e-2 


0.1 


10 


L 2 


1.4e-2 


1.4e-2 


0.2 



Table 5: Approximation accuracy of maximum entropy estimation (MEE) and mean 
squared error (L 2 ) minization for the eikonal PDE. 
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Figure 10: Estimates for solution (left) of eikonal PDE by maximum entropy estimation 
(center) for d = 6 and mean squared error minimization (right) for d = 10. 



d 


problem 






time (sec.) 


10 
10 


( 


(5 
12 


) 


0.08 
0.11 


0.50 
0.56 


0.1 
0.9 


50 
50 


( 


15 
12 


) 


0.05 
0.08 


0.50 
0.54 


0.1 
2.7 


100 
100 


( 


15 
12 


) 


0.05 
0.07 


0.50 
0.55 


0.3 
37.5 



Table 6: Mean squared error minimization for u(x) = Z[o.5,i](^)- 



PDE: 

u 2 x +u 2 y - 1 = on[0,l] 2 , 

u = on d[0,l] 2 , 1 ; 

we apply both maximum entropy and mean squared error minization. Both methods pro- 
vide better pointwise approximates for increasing degrees d, cf. Table [| and Figure [70[ 
Again, the advantage of using mean squared error minimization is its negligible computa- 
tion time. 



4.3 Approximating indicator functions 



In all examples discussed so far, the polynomial approximates u* d of solution u obtained 
by solving the unconstrained problem ^ have been nonnegative on £1 It was therefore 
not necessary to solve the constrained problems (11) or (12). 



Example 10 As a first simple example where solving one of the constrained problems is 
required, consider u : [0, 1] — >> M>,u(x) = Z[o.5,i](^) and its given vector of moments. We 
solve problems |5p and J71| ) for d G {10,50,100}. As illustrated in Figure 11, solving 
problem (12) provides globally nonnegative estimates for u. This comes at the price of 



decreasing approximation accuracy compared to solving the unconstrained problem, since 
the feasible set of problem (12) is a subset of the feasible set of |3p ; cf. Table^ 
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x 

Figure 11: Unconstrained and constrained mean squared error minimization estimates. 
Blue: u ) green: u{ Q unconstrained, red: u{ Q constrained, cyan: u\ m unconstrained, ma- 
genta: u[ m constrained. 

5 Orthogonal bases and the Gibbs effect 

Density estimation via L 2 norm minimization over the truncated space of polynomials is 
a special case of approximating a function belonging to a complicated functional space 
by a linear combination of basis elements of an easier, well-understood functional space. 
This general setting has been considered for both real and complex functional spaces. In 
the following we discuss the particular case when the basis of the easier functional space 
is orthogonal, and also effects resulting from truncating the infinite dimensional series 
expansion of the unknown function. 



5.1 The complex case 

The classic, complex analogue of the real, inverse moment problem discussed in the pre- 
vious sections is the problem of approximating a periodic function u by a trigonometric 
polynomial. An orthonormal basis for the space of trigonometric polynomials is given by 
(e~ lkx )kez- It provides the Fourier series expansion: 

u(x) = ^2y k e ikx , 

where Vk = ^ ^_^u[x)e~ lkx dx in the univariate case, or 

u(x, y) = y k ,ie ikx e il y, 
k,iez 
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where yk,i — ^2 Jl^ /T^ lkx e dy dx dy in the bivariate case. One obtains a trigono- 

metric polynomial approximation for u when truncating this series expansion at some 
deN, 

d 

M x ) = ykelkx - 

k=-d 

It is a well-known fact in Fourier theory that 

\2 



(u — Ud) dx for d —> 00. 

The Fourier approximation for a periodic function is therefore the trigonometric analogue 
of the real polynomial approximation for a function obtained by mean squared error 
minimization in our approach. As in the real case we have almost uniform convergence. 
However, Ud does not converge to u uniformly if u piece wise continuously differentiate 
with jump discount inuities, due to an effect known as the Gibbs phenomenon. It states 
that the truncated approximation shows a near constant overshoot and undershoot near 
a jump discontinuity. This overshoot does not vanish, it only moves closer to the jump 
for increasing d. 

Since the truncated Fourier series is the trigonometric analogue of unconstrained mean 
squared norm minimizer of problem ([5]), the question arises of how the trigonometric 
estimate behaves for a periodic functions when adding nonnegativity constraints as in 



problems (11) and (12) which aim at preventing the estimate from over- or undershooting 
near jump discontinuities. 

In order to derive a tractable SDP problem, the nonnegativity constraints for the trigono- 
metric polynomial need to be relaxed or tightened to LMI constraints. The difference 
with the real case will be the moment and localizing matrices being of Toeplitz type in 



contrast with the Hankel type matrices in the constraints of problem (11) 



5.2 The real case 

In our discussions in the previous sections we approximated an unknown density by a 
linear combination of elements of the monomial basis of the space of real polynomials. 
This choice of a basis for W[x] has several theoretical and practical shortcomings. For 
once, it is not an orthogonal basis with respect to the Lebesgue measure, and moreover the 
moment matrix M^(z) in problem ([5]), whose inverse needs to be computed to determine 
Ufa is severely ill-conditioned [U [29] . As already pointed out in [29], when choosing 
an orthogonal basis such as Legendre or Chebychev polynomials for the L 2 norm 

minimizer can be determined according to a closed- form formula. Thus, we do not 
even have to solve a linear system of equations. This is essentially the real analogue of 
the closed form formula for the coefficients in the Fourier series approximation of periodic 
functions. 



In §5.1| we discussed the Gibbs phenomenon. For reasons outlined there, we expect to 
observe this effect in the real case as well, when approximating an L 2 integrable func- 



tion with jump discontinuities. In fact, the function from Example 10 illustrates that 
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As shown in Figure [TTJ we observe an over- and undershoot on both sides of the dis- 
continuity. The amplitude of this overshoot does not decrease for increasing rf, but the 
overshoot moves closer to the jump. When adding the global nonnegativity constraint 
for the polynomial estimate, the undershoot at the left side disappears. However, it is 
compensated for by a weaker, overall pointwise approximation accuracy of the estimate. 
This observation is a first, partial answer to the question raised in the complex case. 

6 Conclusion 

We introduced an approach for estimating the density of a measure given a finite number 
of its moments only, in the multivarite case and with no continuity assumption on the 
density. As an estimate we choose the polynomial minimizing the L 2 norm distance, or 
mean squared error to the unknown density. We have shown that this estimate is easy 
to determine by solving a linear system of equations. Moreover, it converges almost uni- 
formly towards the desired density when the degree increases. By minimizing the mean 
squared error subject to additional linear matrix inequality constraints, which translates 
to solving a semidefinite program, we obtain a density estimate guaranteed to be non- 
negative on the support of the measure. Also in the constrained case, we have shown 
almost uniform convergence of the nonnegative mean squared error minimizer towards 
the unknown density, when the degree increases. Mean squared error minimization is 
often superior to maximum entropy estimation in terms of approximation accuracy and 
computation time, as demonstrated for a number of examples. Moreover, it is able to 
handle general, basic, compact semialgebraic sets as support for the unknown measure, 
which present a challenge to maximum entropy estimation in higher dimension. 
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